Fault Tolerance via Replication in Coarse Grain Data-Flow

Authors

  • Anh Nguyen-Tuong
  • Andrew S. Grimshaw
  • John F. Karpovich
Abstract

Recent advances in network technology promise to make gigabit-per-second bandwidth between remote hosts a reality in the near future. This increase in bandwidth paves the way for increased exploitation of distributed computing resources. Coupled with advances in distributed memory parallel compiler technology, there is strong reason to believe that wide-area distributed parallel processing will be an increasingly popular and important programming paradigm. Parallelizing and distributing program sub-tasks has the potential to increase performance for many applications while also improving the overall utilization of system resources. Unfortunately, there is a downside. When a program is partitioned into sub-tasks, each sub-task is potentially distributed to a different processor. As the number of processors employed by an application increases, so does the chance that the application will fail due to a host/processor failure.

Similar resources

AR-SMT: Coarse-Grain Time Redundancy for High Performance General Purpose Processors

Time redundancy is a fault tolerance technique in which a task, either computation or communication, is performed multiple times on the same hardware. This technique is cheaper than other fault tolerance solutions that require some form of hardware redundancy, because it does not require replicated hardware. However, fault coverage may be lower with time redundancy as it only captures certain c...

Fault Tolerant Wide-Area Parallel Computing

Executing parallel applications across distributed networks introduces the problem of fault tolerance. A viable solution for fault tolerance must keep overhead manageable and not compromise the high performance objective of parallel processing. In this paper, we explore two options for achieving fault tolerance for a common class of parallel applications, single-program-multiple-data (SPMD). We...

RAIDb: Redundant Array of Inexpensive Databases

Clusters of workstations become more and more popular to power data server applications such as large scale Web sites or e-Commerce applications. There has been much research on scaling the front tiers (web servers and application servers) using clusters, but databases usually remain on large dedicated SMP machines. In this paper, we address database performance scalability and high availabilit...

Robustness Analysis of Distributed Databases via Fault Injection

As the importance of reliable data storage increases, various techniques are applied to obtain the reliabilities. The usage of the assumed fault tolerant cloud providers increases, even though not a sufficient amount of research is performed on their fault tolerance. On cloud servers, a distributed Database Management System (DBMS) can be used to achieve safe data storage. Various DBMSs are mak...

Publication date: 1996